A generalizable framework for spatially explicit exploration of soil organic carbon sequestration on global marginal land

Land-based CO2 removal demands changes in management or new suitable areas to sustainably grow additional biomass without reducing food supply or damaging natural ecosystems. The soil organic carbon (SOC) sequestration pathway is thought to transfer atmospheric CO2 into a land unit, through plants, plant residues and other organic solids stored as part of the soil organic matter. No previous study explored SOC sequestration potentials on global marginal land. Here we integrated, into a generalizable modelling framework, the mapping of a set of biophysical (climatic and edaphic) and land conservation constraints to (i) identify suitable matches (i.e. biophysically possible combinations) of target areas with plant species, and (ii) to quantify contributions of pairing to long-term SOC sequestration (2020–2100). The proposed framework represents a refinement to previous mapping exercises, which seldom consider biophysical constraints, soil erosion, plant species tolerances to pedoclimatic conditions, and world protected areas. The approach was tested on marginal lands featuring SOC-deficient stocks (≤ 50 Mg SOC ha−1 to 30 cm depth) at 30 arc-sec resolution, consolidated into world regions × global ecological zones based on geo-localised products. The framework was shown to enable better-informed decision-making on interventions at large geographical scales, revealing biophysically realistic options, while management should be determined locally.


Definition of marginal land and target areas
Defining land as "marginal" has proved to be challenging [16][17][18] , with some authors even designating it as a non-viable concept 19 . Originally, the concept related exclusively to the economic agricultural framework 20 , concerning the reduced productive capacity and benefit for a given land use, often linked with rural poverty 21 . The concept further evolved across disciplines and scales 22 , adding biophysical (nature-influenced) and environmental (human-influenced) constraints [23][24][25] , and thus comprising wide-ranging land types: idle, underutilised, unused, barren, inaccessible, degraded, abandoned, fallow or set-aside, wasted, and potentially contaminated (e.g. brownfields, landfills) or reclaimed (e.g. remediated mine land) [17][18][19]26,27 . Yet, it has been criticised that the umbrella term ignores the criticality as a means of subsistence for marginalised communities, small-scale farmers or indigenous people and the resource and infrastructure requirements for its exploitation 19 .
A variety of definitions have been proposed (Table S2).
According to Mellor et al. 18 "the most pronounced problem is related to the variation and ambiguity in its definition or understanding", which has consequently led to methodological inconsistencies. Agricultural (potentially suitable for food production historically, currently or in future) and non-agricultural (unsuitable/unfavourable for food production) land types include the following classifications 18 : • Agricultural land type comprises areas that can potentially become productive, despite current biophysical constraints (e.g. sandy, acid or saline soils, highly erodible, or soils prone to droughts, compaction, floods, and sloppy terrains). It covers degraded (reduced soil fertility and productivity), fallow (temporary suspension as a crop rotation period), abandoned (due to declining yields), reclaimed (from previously unsuitable conditions) and wasted (active dunes, salt flats, rocky outcrops, deserts, ice caps and arid mountain regions) land.
• Non-agricultural land type refers to mine land (abandoned after mineral exploitation), brownfields (previously used but currently not fully used), landfills (waste disposal sites) as well as buffers (including utilities and urban land such as parks, roadsides).
• Both land types represent contaminated land (e.g. with metals, petroleum, aromatic and chlorinated hydrocarbon, organic compounds), which can potentially be used after remediation (e.g. phytoremediation) or restoration and under consideration of safety and environmental measures within the contaminated and surrounded areas.
Degraded land, as recently defined in the IPCC 28 refers to as "a negative trend in land condition, caused by direct or indirect human-induced processes, including anthropogenic climate change, expressed as longterm reduction or loss of at least one of the following: biological productivity, ecological integrity or value to humans" Key marginal land mapping studies have retained slightly different sets of biophysical criteria to identify and map marginal lands (   Table S3). Elbersen et al. 17 identified a set of biophysical, land use management, socio-economic and ecosystem services constraints to map marginal land suitable for industrial crops in Europe in the context of the EU H2020 MAGIC project. The biophysical (i.e. natural) criteria were retained, following an approach by the Joint Research Centre 29 : adverse climate (low temperature, dryness), excessive wetness (excess soil moisture, limited soil drainage), adverse chemical composition (salinity, sodicity, natural toxicity, toxicity by pollutants), low soil fertility (pH, SOC), limitations in rooting (unfavourable soil texture, coarse fragments, organic soils, surface rockiness, shallow rooting depth), adverse terrain conditions (steep slope, flooding risk). An assessment of biomass resources from marginal lands in Asia-Pacific Economic Cooperation economies 24 retained terrain (slope) constraints and soil problems. The latter are roughly equivalent to MAGIC's "limitations in rooting" group of constraints and FAO's classification of problem soils/degraded lands 30 .
A key component of marginal lands is abandoned agricultural land, which in our definition (see main article) corresponds to recent conversion of agricultural land to mosaic cropland/natural vegetation (complemented with mosaic cropland/natural vegetation to semi-natural), grasslands, sparse vegetation, bare areas, mosaic herbaceous cover or shrubland. Land cover classes corresponding to FAO Land Cover Classification System (LCCS3) 4 are listed in Table S4.
To define target areas, as discussed in the main article, all marginal lands within the same GEZ and geopolitical world region (listed in ) were consolidated and their values averaged, as previously done for global assessments requiring characterisation of larger regions with data at a finer granularity (e.g. 31 ).

Marginal rent
The poorest lands utilized above the margin of rent-paying land with respect to the next lower purpose. 34 Marginal land Limitations which in aggregate are severe for sustained application of a given use. Increased inputs to maintain productivity or benefits will be only marginally justified. Limited options for diversification without the use of inputs. With inappropriate management, risks of irreversible degradation. 21 Marginal land Depends on the interaction of physical, environmental, social and economic aspects. Implies that abandonment can occur everywhere, even in areas with a high yield potential, and even in a satisfying general economic situation.
Set-aside, abandonment. Land uses that are at the margin of economic viability. 35 Marginal land Limited productive or regulatory function Degraded land 36 Abandoned agricultural lands Land that have been abandoned to crop and pasture due to the relocation of agriculture and due to degradation from intensive use. Marginal land Relatively poor natural condition but is able grow energy plants, or land that currently is not used for agricultural production but can grow certain plants.
Woodland (shrub land, sparse forest land), grassland and barren land (including shoal/bottomland, saline and alkaline land, and bare land). Shrub, high/moderate grassland cover excluded due to eco-environmental security. China, regional Cassavabioethanol Marginal land Unsuitable for crop production, but ideal for the growth of energy plants with high stress resistance. These lands include barren mountains, barren lands and alkaline lands

Harmonisation of climate zones
Adaptation of climate zone classes used in ECOCROP 13 based on the Köppen climate classification 59 with Global Ecological Zoning (GEZ) framework 10 , updated for 2010, considering the separate classification of "mountain system" (see definition) in GEZ based on Köppen-Trewartha 59,60 , due to high variations of both vegetation formations and climatic conditions 10 .
Criteria described in 10 (

Biopump selection and ranking
The pre-selection of potential biopumps was based on a semi-quantitative analysis. Table S7 shows the criteria considered for the scoring and ranking procedure and main sources of data and information.
The first criterion quantified two main attributes considering annual SOC stock changes [t C ha -1 yr -1 ] and belowground C input fraction [t C ha -1 ]. SOC changes were computed from 58 considering land transformation from fallow, short-rotation coppice, crop-, grass-, and forest land to perennial crops; for both top-(≤30 cm) and sub-(>30 cm) soils per tropical, subtropical and temperate climate zones (here the reported boreal zone was linked to temperate and the arid and Mediterranean zones to subtropical zones). Belowground root C allocation was based on the plant fractioning and carbon partitioning approach 61 64 . It has been suggested that C inputs to the soil may provide a more robust estimate than a fixed shoot:root ratio 65 . Moreover, about half of the C assimilated by plants is transferred to the soil 66 .
The second criterion quantified the productivity in terms of mean, min and max yields [t ha -1 yr -1 ] expressed in dry mass 62 . Data for agricultural crops were retrieved from FAOSTAT 14 for the years 2010 to 2018, corresponding to values from all known regions and the global mean. For lignocellulosic crops, data were retrieved from Li et al. 15 , mostly experimental data over several consecutive years. For the remaining innovative crops, data were retrieved from various peer-reviewed sources.
The third criterion qualified marginal land suitability 67 . Species with high abiotic stress tolerances (e.g. to droughts, frost, sandy soils, etc.) and other relevant features associated with marginal land (e.g. phytoremediation properties, low input) were scored higher.
We evaluated the biopumps by re-scaling quantitative data, assigning scores, weighting, standardising, and ranking (Table S7). Re-scaling was necessary to obtain a common numerical scale by normalising the values between zero and one [0;1] based on the Min-Max scalar, where the range of the values change but the shape of the data is conserved. The values were then scored in ascending order: very low [0], low [1], moderate [2], good [3], and high [4]. Next, the scores were weighted based on the arithmetic weighted mean followed by a statistical standardisation via the z-score. Finally, values with negative standard deviation (i.e. all scoring below the mean) were excluded, and all positive ones ranked with the best observation close to the maximum.

Selection of an adapted soil carbon model
The selection of a model for predicting soil carbon sequestration (SCS) is not straightforward, as no single one clearly outperform the others 70 and multi-model comparisons have not been conclusive on a particular model 71 . The number of models describing biogeochemical processes in the soil has increased considerably since the 1930s to more than 250 distinctive ones 72 . A minor subset of available models is widely used, where the most cited ones are Century, RothC, DNDC, EPIC and DSSAT 71 . Soil models generally differ vis-àvis model structure (from simple mineralisation to integrating the soil-plant dynamic and multiple flow exchanges), number of conceptual C pools (most comprising 2-5 pools), as well as spatial (from soil aggregates to landscape applications) and temporal (hour to centuries) resolutions. Most models include soil organic matter (SOM) dynamics. The mathematical formalism for SOM decay proposed by Hénin and Dupuis 73 is implemented in most models. It follows a simple first order differential equation with constant rates as a function of time, which is controlled by a variety of external climatic and edaphic factors (e.g. temperature, moisture, pH, texture and clay mineralogy), as well as land use and land management practices 74,75 . A comparison of commonly used SOC models is presented in Table S8.
To choose a model for the proposed framework, we followed the rating criteria presented in Köck et al. 76 for Tier 3 GHG inventory reporting 77 and the technical guidelines for spatially explicit modelling of SCS and mapping by the FAO 78 . An essential criterion is the model capacity to represent carbon dynamics at a wide range of spatial and temporal resolutions, which basically segregates the models into "types" 1 and 2 79 .
The former model SOM dynamics with "no dynamic vegetation component" 72 , as the C inputs are based on simple allometric relations 74 , which requires less inputs and predicts the net SOC change at lower level of temporal resolutions. The latter belong to the (agro-)ecosystem models, and represent a large phase-space dimension 72 determined by a number of sub-models, parameters and measurements at high temporal resolutions. Our selection focused on type 1 models, as a high-level resolution was not deemed necessary for long-term simulations at regional scales.
Further criteria were considered: land use category (at least crop and grassland at different altitudes), soil type (excluding organic soils), soil depth (mainly topsoil), management practices (e.g. external C inputs from fertilisation and amendments). Models fulfilling most of the retained criteria were RothC and C-tool. The overall performance of these models, as compared to that of type 2 ones, has been shown to be good. C-tool showed similar C and N interactions when compared to DAISY 79 , while RothC produced similar results as Century 80,81 .
The Rothamsted C model, RothC 82,83 , computes change in SOM from known C inputs 84 . It uses a monthly time step and subdivides the soil into five conceptual SOM pools: decomposable plant material (DPM), resistant plant material (RPM), microbial biomass (BIO), humified organic matter (HUM)) and inert organic matter (IOM). C inputs are first allocated to DPM (fast turnover) and RPM (slow turnover) based on the DPM:RPM ratio determined by the quality and distribution of plant input throughout one year, yet the distribution is insensitive to long-term C inputs, which makes the model applicable globally 85 . The decay process depends on soil clay content [%], average monthly temperature [°C], precipitation and evapotranspiration [mm], land cover and management, soil depth [cm] and annual C inputs [t C ha -1 ] from residues and/or exogenous organic matter (e.g. manure). C inputs specific to each pool (except for IOM) are described by a rate constant parametrised for grassland, crop and forest land. RothC has been used in a wide range of climates and regions of the world (more than 80 countries) in combination with GIS products [85][86][87] , and is currently recommended as a standardised spatialised SOC model for national comparisons at a 30 arcsec resolution 78 . The latest version is RothC v26.3 84 , but a series of versions (e.g. RothPC-1 to simulate andosols subsoil C 88,89 , RothC10_N for dry soils in arid and semiarid regions 90 ) and methods (e.g. initialisation without historic data for wide ranging soil conditions 91 ) have been developed. Main persisting limitations of the model include permanent waterlogged soils and organic soils 90 .

RothC initialisation
To initialize the RothC model, we applied the pedotransfer functions for the three active RothC pools by Weihermüller et al. 121  where SOC and clay are expressed in t ha -1 and % respectively. The IOM function in absence of radiocarbon data based on clay content and/or SOC. The DPM:RPM ratio for C inputs from plant residues per crop, grass, and tree cover was set at 1. 44

SOC erosion
Eroded soil organic carbon (SOCeroded) [Mg SOC ha -1 yr -1 ] is calculated following the method in 123  where SOC, bulk density and depth are available from the Harmonized World Soil Database (HWSD) 6 , ER (enrichment factor) is set to 1 and CF (cover-management factor) is a species-dependent factor listed in Table S9. For reference, annual averaged soil loss by erosion (E) [t ha -1 yr -1 ] was computed in the input data source (GloSEM v1.1) with the RUSLE2015 equation 125 , as modified from the original RUSLE 126 (Eq. 6).